Kernel Ridge Regression
This article describes both the general usage of the Kernel Ridge Regression and its usage for recommenders. General description Let X be an n*d matrix of explanatory variables, n is the number of observations, d is the number of explanatory variables, X_{ij} is j-th element of the i-th observation. Let y be the vector of target values. Let I be an identity matrix of the relevant dimension. The ridge regression is : \hat y=X \beta : \beta = (X^TX + \lambda I)^{-1} X^T y where \lambda is the parameter of ridge regression. Note that : (X^TX + \lambda I)X^T = X^TXX^T + \lambda X^T = X^T(XX^T+\lambda I) Multiplying each part of the equation by (X^TX + \lambda I)^{-1} at the left and (XX^T+\lambda I)^{-1} y at the right, we have : X^T (XX^T + \lambda I)^{-1} y = (X^TX + \lambda I)^{-1} X^T y Recalling the value of \beta , we have : X^T (XX^T + \lambda I)^{-1} y = \beta What do we have from this? Let : \alpha = (XX^T + \lambda I)^{-1} y Then : \beta = X^T \alpha For a novel observation (x', y') , the prediction is : \hat y^\prime = \beta^T x' = y^T (XX^T + \lambda I)^{-1} X x' Define now : K_{ij} = x_i x_j (K is for kernel) : \kappa_i = x_i x' where x_i is i-th row of the matrix X. Then : \hat y^\prime = y^T (K + \lambda I)^{-1} \kappa Now \hat y^\prime depends on K and \kappa , rather than on X. If we perform a transformation x \rightarrow \Phi(x) , then what we need is to recalculate K and \kappa . Instead of explicit transformations, we can replace K_{ij} = x_i x_j , \kappa_i = x_i x' by : K_{ij} = f(x_i,x_j) : \kappa_i = f(x_i,x') where f() is some function. KRR for recommenders Notation Below: * The letters: ** u denotes a user ** i denotes an item ** U denotes number of users ** I denotes number of items ** f() denotes an arbitrary function ** r denotes rating (or anything else which a recommender try to forecast, i. e. probability to select an item) ** p and q denotes user and item feature vector, respectively (see explanation in text) * r_{ui} is the rating given by user u to item i * \hat{r}_{ui} is the predicted rating given by user u to item i * R(u) is the set of all items rated by the user u, while the rating is known. * I(u,l) is l -th item of R(u) Description First, the matrix factorization with biases is performed. This means to determine user features p_1, ..., p_U , item features q_1, ..., q_I , user biases b_1, ..., b_U 's, item biases b^\prime_1, ..., b^\prime_I , global bias \mu , such that the predictions \hat{r}_{ui} = p_u^{T}q_i + b_u + b_i + \mu would approximate real ratings r_{ui} . This is performed by gradient descent with relaxation or another algorithm. See Matrix factorization for more information. After the matrix factorization is finished, user features are dropped. Now that for each user u, minimizing the expression : \sum_{i\in R(u)} (r_{ui}-p_u q_i)^2 with respect to p constitutes a linear regression problem. Instead of it, however, we perform the kernel ridge regression described in the previous section: : \hat{r}_{ui} = {r^{(u)}}^T (K^{(u)} + \lambda I)^{-1} \kappa_u(i) where * r^{(u)} is the vector of ratings for this user in the training set, r^{(u)}_l = r_{u,I(u,l)} * K^{(u)} is the matrix defined as K^{(u)}_{lm} = f \left( { q_{I(u,l)} q_{I(u,m)} \over \lVert q_{I(u,l)} \rVert \lVert q_{I(u,m)} \rVert } \right) * \kappa^{(u)}(i) is the vector defined as \kappa^{(u)}_l = f \left( { q_{I(u,l)} q_i \over \lVert q_{I(u,l)} \rVert \lVert q_i \rVert } \right) For the Netflix contest, good choices of function f were * f_1(x) = \exp(c(x-1)) and, slightly worse, * f_2(x) = \begin{cases} \exp \left( c \left( 1 - {1 \over x} \right) \right) & x>0 \\ 0 & \text{otherwise} \end{cases} Here c is the only model parameter. For f_1 , good values of c were 4 and 5. For f_2 , the best value of c was 4. External links * [http://www.cs.uic.edu/~liub/KDD-cup-2007/proceedings/Regular-Paterek.pdf Improving regularized singular value decomposition for collaborative filtering], by Arkadiusz Paterek, section 3.6 Category:Analytics Category:Comment methods Category:Recommenders